AITopics | natural policy gradient method

Collaborating Authors

natural policy gradient method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Review for NeurIPS paper: An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Neural Information Processing SystemsJan-24-2025, 13:09:12 GMT

Additional Feedback: Detailed comments on a section-by-section basis are given below. Lines 10-13 of abstract: I found this quite vaguely worded, and didn't understand which contributions in the main paper this is referring to. Section 1 "Policy gradients are usually estimated via Monte-Carlo rollouts". To me, this would suggest that critics are not used, or are used at most as baselines. However, as far as I am aware, most of the successful methods cited in the previous paragraph do use bootstrapping, and so are not pure Monte Carlo, in the sense that the term is used in RL.

improved analysis, natural policy gradient method, variance-reduced, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)

Add feedback

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Neural Information Processing SystemsOct-10-2024, 05:56:47 GMT

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information matrix of the policy being positive definite: i) we show that a state-of-the-art variance-reduced PG method, which has only been shown to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance-reduction into the NPG update. Our improvements follow from an observation that the convergence of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied on NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods. Thanks to this improvement, we have also made variance-reduction for NPG possible for the first time, with both global convergence and an efficient finite-sample complexity.

improved analysis, natural policy gradient method, variance-reduced, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.43)

Add feedback

Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

Müller, Johannes, Cayci, Semih

arXiv.org Artificial IntelligenceJun-25-2024

We study the error introduced by entropy regularization in infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problemspecific exponent. We provide a lower bound matching our upper bound up to a polynomial term. Our proof relies on the correspondence of the solutions of entropy-regularized Markov decision processes with gradient flows of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of the gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this particular gradient flow which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropyregularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations improving existing sublinear guarantees.

convergence, gradient flow, markov decision process, (12 more...)

arXiv.org Artificial Intelligence

2406.04163

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Saxony > Leipzig (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

Duan, Jingliang, Li, Jie, Chen, Xuyang, Zhao, Kai, Li, Shengbo Eben, Zhao, Lin

arXiv.org Artificial IntelligenceOct-29-2023

In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback (SOF) control in discrete-time LTI systems subject to quadratic cost. We begin by establishing crucial properties of the SOF cost, encompassing coercivity, L-smoothness, and M-Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods, including the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we provide proof that the vanilla policy gradient method exhibits linear convergence towards local minima when initialized near such minima. The paper concludes by presenting numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent for optimizing the SOF problem but also provide insights into the effectiveness of general policy gradient methods within the realm of reinforcement learning.

convergence, gradient method, policy gradient method, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TCYB.2023.3323316

2310.19022

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > Singapore > Central Region > Singapore (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Control Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

Hambly, Ben, Xu, Renyuan, Yang, Huining

arXiv.org Machine LearningJul-27-2021

Policy optimization algorithms have achieved substantial empirical successes in addressing a variety of non-cooperative multi-agent problems, including self-driving vehicles [17], real-time bidding games [8], and optimal execution in financial markets [6]. However, there have been few results from a theoretical perspective showing why such a class of reinforcement learning algorithms performs well with the presence of competition among agents. As a starting point to tackle this challenging problem, we investigate linear-quadratic games (LQGs) which can be seen as a generalization of the linear-quadratic regulator (LQR) from a single agent to multiple agents. In an LQG, all agents jointly control a linear state process, which may be in high dimensions, where the control (or action) from each individual agent has a linear impact on the state process. Each agent optimizes a quadratic cost function which depends on the state process, the control from this agent and/or the controls from the opponents.

gradient method, lemma 3, natural policy gradient algorithm, (11 more...)

arXiv.org Machine Learning

2107.1309

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks

Miyamae, Atsushi, Nagata, Yuichi, Ono, Isao, Kobayashi, Shigenobu

Neural Information Processing SystemsDec-31-2010

In this paper, we propose an efficient algorithm for estimating the natural policy gradient with parameter-based exploration; this algorithm samples directly in the parameter space. Unlike previous methods based on natural gradients, our algorithm calculates the natural policy gradient using the inverse of the exact Fisher information matrix. The computational cost of this algorithm is equal to that of conventional policy gradients whereas previous natural policy gradient methods have a prohibitive computational cost. Experimental results show that the proposed method outperforms several policy gradient methods.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Asia > Japan (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)

Add feedback